Abstract

Graphs are called navigable if one can find short paths through 
them using only local knowledge. It has been shown that for a graph 
to be navigable, its construction needs to meet strict criteria. Since 
such graphs nevertheless seem to appear in nature, it is of interest to 
understand why these criteria should be fulfilled. 

In this paper we present a simple method for constructing graphs 
based on a model where vertices are "similar" in two different
ways, and tend to connect to (or cluster with) those most similar to them
with respect to both. We prove that this leads to navigable networks 
for several cases, and hypothesize that it also holds in great generality. 
Enough generality, perhaps, to explain the occurrence of navigable 
networks in nature. 



1 Introduction 

Motivated by the small-world experiments of Stanley Milgram [16], and the
models for social networks inspired by them [20], Jon Kleinberg introduced
the question of whether graphs can be searched (or navigated) in a decentralized
manner [13]. In particular, he showed that when a grid structure is
augmented by random edges, whether it is possible to use those edges to efficiently
route queries from one point in the grid to another depends on their
distribution. Specifically, if each vertex x in a d-dimensional grid is given
one additional "long-range" link to some vertex, beyond those to its nearest
neighbors, then when the probability of y being the one selected is proportional
to $|x - y|^{-d}$, greedy routing on the resulting graph is expected to complete
in a number of steps polylogarithmic in the graph size. If the probability
of y being selected is proportional to any other power of the distance (in particular
the power 0, meaning the long-range link is selected uniformly), then any form of routing
which uses only information about the points seen thus far will require a
number of steps which is a fractional power of the graph size.

A natural question following from Kleinberg's results is to ask if there is any 
dynamic which might cause the frequency of edges in naturally occurring 
graphs to have the sought relationship with their length. Several empirical 
studies of social network data following Kleinberg have observed just such 
a relationship [1] [15], making it plausible that such a dynamic may exist,
but it has not been identified. 

In this paper we observe that the desired edge distribution arises naturally 
in another probabilistic model, that of best-yet sampling from a population, 
and use this to show how navigable networks may arise when vertices belong 
to two independent spaces and tend to cluster in both (in social network 
terms, these may be identified with the physical world and a metaphorical
space of "interests": people tend to befriend those who are close in
either sense). The resulting spatial random graph, which we dub the double
clustering graph, turns out to be navigable with respect to both spaces. 

1.1 Previous Work 

The original questions about navigability in a geographical setting were 
posed and answered by Kleinberg in [13] and [12]. Later, Kleinberg [14]
and Watts et al. [19] independently proposed similar models based on the 
categorization of ideas or characteristics. The latter paper includes the idea 
that vertices may be similar in several independent spaces, but does not 
discuss how this might lead to the desired edge distribution. Fraigniaud [9] 
went further and discussed augmentation in more general settings based on 
tree-decompositions of general graphs. 

Some conceptually different work has been done previously to try to explain 
the emergence of Kleinberg-type edge frequencies. In particular, [3], [18] and
[17] propose graph rewiring processes which seem to create navigable networks
in their stationary state. These may help explain how such networks
arise under some circumstances, but are not always an easy fit with observed
reality, and have so far eluded complete analysis. [8] shows a form of navigable
augmentation that depends on little knowledge of the base graph, but
this algorithm is complicated and does not give an intuitive reason why the
desired edge distribution should arise.

1.2 Contribution 

We characterize our contribution as follows: 






• We introduce the "double clustering" graph construction. This is a
simple rule for constructing a graph between a set of vertices with positions
in two different spaces, so that they tend to connect to those
nearest in both. Double clustering can be seen as a spatial or combinatorial
construction depending on whether the points are originally
placed in metric spaces or in graphs.

• We show analytically that in several cases double clustering leads to 
navigable graphs. 

• We hypothesize that this holds for a much larger class of such graphs, 
something we illustrate with simulation of several relevant sub-models. 

2 Navigable Graphs 
2.1 Decentralized Routing 

Let G = (V, E) be a connected finite graph of high diameter (some power of |V|),
and let the random graph G' be created by the addition (augmentation)
of random edges to G. It is well known, see for instance [3], that the
diameter shrinks quickly to a logarithm of |V| when random edges are added
between the vertices. Navigability concerns not a small diameter, however,
but rather a stronger property: the possibility of finding a short path between
two vertices in G' using only local knowledge at each vertex visited. By
local knowledge, one means that each vertex knows G, but does not know
which random edges have been added to any vertex until it is visited. The
exact limits of such decentralized routing algorithms have been discussed
elsewhere [13] [2], but since we are interested only in upper bounds, we will
define only the subset of such algorithms of interest to us. When routing
for some target vertex z, in each step we will select as the next vertex in
the path a G'-neighbor of the current vertex x. This choice will be made
entirely as a function of each neighbor's G-distance to z and nothing else.
All such algorithms are decentralized by Kleinberg's definition.

The most direct decentralized algorithm, and the most important, is greedy 
routing. In greedy routing, the next vertex chosen is that neighbor which is 
closest to z in G (with some tie-breaking rule applied). Note that both the
original and augmented edges can be used, but because the choice is only 
optimal with respect to G, the path discovered by greedy routing will seldom 
be a minimal path in G' . In one case below we will modify the routing to 
divert from a greedy choice slightly for technical reasons, but the principle 
is still the same. 
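As an illustration only, greedy routing can be sketched as below. The sketch assumes, hypothetically, that G' is stored as a dict mapping each integer vertex to the list of its G'-neighbors and that `dist(v, z)` returns the G-distance; the name `greedy_route` and the smallest-label tie-breaking rule are our own choices rather than anything fixed by the text.

```python
from typing import Callable, Dict, List


def greedy_route(neighbors: Dict[int, List[int]],
                 dist: Callable[[int, int], int],
                 source: int, target: int) -> List[int]:
    """Greedy decentralized routing from source to target.

    At each step only the G'-neighbors of the current vertex are examined
    (local knowledge), and the one with the smallest G-distance to the
    target is chosen, ties broken by the smaller vertex label.
    """
    path = [source]
    current = source
    while current != target:
        nxt = min(neighbors[current], key=lambda v: (dist(v, target), v))
        if dist(nxt, target) >= dist(current, target):
            # Cannot happen when G' contains G's edges and G is connected.
            raise RuntimeError("no neighbor is closer to the target")
        current = nxt
        path.append(current)
    return path


# Hypothetical example: a 10-cycle (base graph G) plus one shortcut 0 -> 5.
n = 10


def ring_dist(a: int, b: int) -> int:
    return min(abs(a - b), n - abs(a - b))


neighbors = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
neighbors[0].append(5)
path = greedy_route(neighbors, ring_dist, 0, 6)
print(path)  # -> [0, 5, 6]
```

Only base-graph distances of the current vertex's neighbors are consulted, which is what makes the procedure decentralized in Kleinberg's sense.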

We start with G as a d-dimensional n-grid (that is, $V = \{1, 2, \ldots, n\}^d$ and
there are edges between adjacent vertices) and independently add a single
directed edge from each vertex to a random destination. The long-range
connection is added such that for $x, y \in V$ and some $\alpha \ge 0$,

$$P(x \to y) = \frac{1}{h_{\alpha,n}\,|x - y|^{\alpha}} \qquad (1)$$

where $x \to y$ is the event that x is augmented with an edge to y, $|x - y|$
denotes distance in $\mathbb{Z}^d$, and $h_{\alpha,n}$ is a normalizing constant.

The by now well-known result of Kleinberg is that when $\alpha = d$, greedy
routing between any two points in V takes $O(\log^2 n)$ steps in expectation,
while for any other value of $\alpha$ every decentralized algorithm creates routes of
expected length at least $\Omega(n^s)$ steps for some $s > 0$ (where s depends on
$\alpha$ and the dimension but not on the algorithm chosen).
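To make (1) concrete, the following sketch computes the long-range link distribution of a single vertex x on the grid $\{1, \ldots, n\}^d$ with lattice (L1) distance, and samples a contact from it. The function names are hypothetical. On the bounded grid the normalizing sum computed here depends slightly on x (it would not on a torus), and the exhaustive enumeration of all $n^d$ vertices is only meant for small grids.

```python
import itertools
import random


def long_range_distribution(x, n, d, alpha):
    """Distribution of x's long-range contact on the grid {1, ..., n}^d:
    P(x -> y) = |x - y|^(-alpha) / h, where |.| is the lattice (L1) distance
    and h normalizes the probabilities over all y != x."""
    targets, weights = [], []
    for y in itertools.product(range(1, n + 1), repeat=d):
        if y == x:
            continue
        dist = sum(abs(a - b) for a, b in zip(x, y))
        targets.append(y)
        weights.append(dist ** (-alpha))
    h = sum(weights)  # the normalizing constant, computed for this particular x
    return targets, [w / h for w in weights]


def sample_long_range_contact(x, n, d, alpha, rng=random):
    """Draw one long-range contact for x according to (1)."""
    targets, probs = long_range_distribution(x, n, d, alpha)
    return rng.choices(targets, weights=probs, k=1)[0]


# Kleinberg's navigable case alpha = d on a 2-dimensional 10 x 10 grid.
contact = sample_long_range_contact((3, 3), n=10, d=2, alpha=2.0,
                                    rng=random.Random(1))
print(contact)
```

Setting `alpha=0.0` recovers the uniform choice of long-range link discussed in the introduction.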



2.2 Doubling Dimension and More General Augmentation 

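It should be noted that if the graph is a d-dimensional grid as above, and
for $x \in V$, $B_r(x) = \{y \in V : |y - x| \le r\}$, then $|B_r(x)| \propto r^d$.
For $\alpha = d$, (1) can then be interpreted as

$$P(x \to y) \propto \frac{1}{|B_{|x - y|}(x)|}. \qquad (2)$$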

This general principle, that under navigable augmentation the probability
that x links to y should be inversely proportional to the number of vertices
that are closer to x than y, has been observed to hold across a wider class of
graphs than just the grids, see e.g. [11] [7] [15], and seems to be the general
principle behind navigability. It leads directly to our first construction.

A natural generalization of grids is to study graphs which are naturally
grid-like. Let $B_r(v)$ be as above, but using graph instead of grid distance.

Definition 2.1. A family of graphs has bounded doubling dimension if
there exists a family-wide constant c such that for all G = (V, E) in the
family, $u, v \in V$, and $r \ge 1$,

$$B_r(u) \subseteq B_{2r}(v) \implies |B_{2r}(v)| \le c\,|B_r(u)|.$$



The commonly used doubling dimension of the family roughly corresponds to
$\log_2$ of the smallest such c. This is not the widest class of graphs where
navigable augmentation is possible ([7] and [10] have shown that families
with a sufficiently slowly growing dimension can still be made navigable),
but it provides a good compromise between generality and convenience
for us.
In the constructions below, we will augment the base graph with more than
one long-range edge per vertex. In general, a k-edge augmentation is expected
to give $O(\log^2 n / k)$ expected greedy routing time. Our constructions
will generate close to log n edges per vertex in expectation¹, and thus
have routing time O(log n). They most resemble previously explored finite
long-range percolation models, for which the diameter is known to be
$O(\log n / \log \log n)$ [5].



3 The Independent Interest Model 

We start by introducing a conceptually simple model. Compared to our main
model below, it is not a particularly interesting model of network dynamics,
and not particularly realistic, but it serves to illustrate the reasoning we will
use later.

Let $X_1, X_2, \ldots, X_n$ be n random variables drawn from an exchangeable joint
distribution such that $P(X_i = X_j) = 0$ for $i \ne j$. It is well known that in this
case, for any k, $P(X_k > X_j \text{ for all } j < k) = 1/k$, and that these events
are independent for different k. This fact, combined with the principle that x should
link to y with probability inversely proportional to the number of vertices closer to x
than y, motivates the following graph model.

Definition 3.1. (The Independent Interest Graph) Let G = (V, E) be a
graph, and for $x, y \in V$ let $d(x, y)$ be the graph (geodesic) distance between
them.

Create the long-range links as follows: for each vertex x, independently create
an exchangeable sequence of random variables $(X_y)_{y \in V}$. Then add an edge
from x to y if:

$$X_y > X_z \text{ for all } z \ne x : d(x, z) < d(x, y).$$
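A brute-force sketch of Definition 3.1 follows. It assumes, for illustration only, that the base graph is an n-cycle and that the exchangeable interests are i.i.d. uniform variables (one valid choice among many); the name `independent_interest_graph` is hypothetical. Vertices at equal distance from x do not block one another, matching the strict inequality $d(x, z) < d(x, y)$ in the definition.

```python
import random
from collections import defaultdict


def independent_interest_graph(dist, vertices, seed=0):
    """Long-range links of the independent interest graph (Definition 3.1).

    For every x a fresh i.i.d. uniform interest X_y is drawn for each y != x,
    and x -> y whenever X_y exceeds X_z for every z != x with
    dist(x, z) < dist(x, y).  Returns a dict: x -> list of out-neighbours.
    """
    rng = random.Random(seed)
    out = {}
    for x in vertices:
        interest = {y: rng.random() for y in vertices if y != x}
        # Group the other vertices into shells of equal distance from x.
        shells = defaultdict(list)
        for y in interest:
            shells[dist(x, y)].append(y)
        best_closer = float("-inf")  # max interest among strictly closer vertices
        links = []
        for r in sorted(shells):
            shell = shells[r]
            links.extend(y for y in shell if interest[y] > best_closer)
            best_closer = max(best_closer, max(interest[y] for y in shell))
        out[x] = links
    return out


# Illustration on an n-cycle: nearest neighbours are always linked, and the
# mean out-degree grows roughly like log n.
n = 500


def ring(a, b):
    return min(abs(a - b), n - abs(a - b))


out = independent_interest_graph(ring, list(range(n)), seed=1)
mean_degree = sum(len(v) for v in out.values()) / n
print(f"mean out-degree for n={n}: {mean_degree:.2f}")
```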

In the social network metaphor, each $X_y$ in the construction above can be
seen as x's interest in y, and the construction simply means that x befriends
each y who is more interesting to him than any closer person. In other words,
starting from his own position, x searches outwards for friends, befriending
each new person he meets if that person is more interesting to him than the
people he already knows. In reality, of course, it is unlikely that the interest
levels $X_y$ would be independent for each x and y; in particular, one would
expect a high correlation between x's interest in y and y's interest in x. This
fact will inspire our later models below.
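The "searching outwards" description is a record process: if x meets people one at a time, at strictly increasing distances, the k-th person is befriended exactly when their interest is a new maximum, which happens with probability 1/k. A quick Monte Carlo check of this fact, assuming i.i.d. uniform interests and using a hypothetical helper name, might look as follows.

```python
import random


def record_frequencies(n, trials, seed=0):
    """Empirical P(X_k > X_j for all j < k), k = 1..n, for i.i.d. uniform X's."""
    rng = random.Random(seed)
    counts = [0] * n
    for _ in range(trials):
        best = float("-inf")
        for k in range(n):
            value = rng.random()
            if value > best:      # person k+1 is more interesting than all before
                counts[k] += 1
                best = value
    return [c / trials for c in counts]


freqs = record_frequencies(n=10, trials=100_000)
for k, f in enumerate(freqs, start=1):
    print(f"k={k:2d}  empirical={f:.4f}  1/k={1 / k:.4f}")
```

Summing the frequencies gives the expected number of friends among the first n people met, which is the harmonic number $\sum_{k \le n} 1/k \approx \ln n$.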

¹ A degree growing to infinity may seem unrealistic in terms of social networks, but note
that log(6 billion) ≈ 22.5, which is probably considerably less than the average number of
acquaintances a person has in the real world for most definitions of the word "acquaintance".
Our models can be given bounded degree by simply thinning the edges (removing
each edge independently with probability 1 − 1/log(n)).
That the independent interest graph is navigable in fact follows from the
observations above and previous results, but for illustration we will give a 
direct proof here. 

Theorem 3.2. For the independent interest graph built on any family of connected
graphs with bounded doubling dimension, the greedy path between any two vertices
has expected length O(log n), where n is the size of the graph.

Proof. Let $z \in V$ be the target vertex. We follow the standard method:
divide V into O(log n) phases, with the i-th phase defined as the set of
vertices x such that $2^{i-1} < d(x, z) \le 2^i$. At a vertex x in the i-th phase, for
$i > \log \log n$, let A be the event that x has a shortcut to a lower phase, that
is

$$A = \{x \to y,\ y \in B_{2^{i-1}}(z)\} = \{x \to B_{2^{i-1}}(z)\}.$$

Let

$$w = \arg\max_{y \in B_{(3/2)2^i}(x) \setminus \{x\}} X_y.$$

By construction, x has a link to w, so A will occur if $w \in B_{2^{i-1}}(z) \subseteq B_{(3/2)2^i}(x)$.
That the family has bounded doubling dimension thus means there is a constant c
such that $|B_{(3/2)2^i}(x)| / |B_{2^{i-1}}(z)| \le c^2$. Since each vertex
in the larger ball is equally likely to be the most interesting,

$$P(A) \ge \frac{1}{c^2},$$

independent of n and i. If A does not occur, then in the next step we are
by necessity closer to z (nearest neighbors in the base graph are always
connected), and because the edges are chosen independently at each vertex, A occurs at
the new vertex with at least the same probability. Therefore the expected
number of steps until A occurs, an upper bound on the number of steps in a
phase, is at most $c^2$.

For each sufficiently big phase, we thus have a constant bound on the expected
number of steps. Since the destinations of the edges at each vertex
are independent of the previous path taken by the query, it follows that
the expected number of steps in such phases is at most the sum over all of
them, which is O(log n). Once the route reaches a phase with $i \le \log \log n$,
the distance to z is at most log n, and since every greedy step decreases it by
at least one, at most log n further steps remain, so the result holds. □
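The proof suggests a simple experiment: greedy routing on an independent interest graph over an n-cycle (a family with bounded doubling dimension) should take a number of steps growing roughly like log n. The sketch below samples each visited vertex's links only when the route reaches it, which is legitimate here because greedy routing never revisits a vertex and the interests at different vertices are independent. The cycle base graph, uniform interests and all function names are illustrative assumptions.

```python
import math
import random
from collections import defaultdict


def ring_dist(a, b, n):
    d = abs(a - b) % n
    return min(d, n - d)


def sample_links(x, n, rng):
    """Out-links of x in an independent interest graph on the n-cycle."""
    interest = {y: rng.random() for y in range(n) if y != x}
    shells = defaultdict(list)
    for y in interest:
        shells[ring_dist(x, y, n)].append(y)
    best, links = float("-inf"), []
    for r in sorted(shells):
        links.extend(y for y in shells[r] if interest[y] > best)
        best = max(best, max(interest[y] for y in shells[r]))
    return links


def greedy_steps(n, source, target, rng):
    """Greedy steps from source to target; links are sampled lazily, which is
    valid because greedy routing never revisits a vertex and every vertex's
    interests are independent of all others."""
    x, steps = source, 0
    while x != target:
        x = min(sample_links(x, n, rng), key=lambda y: ring_dist(y, target, n))
        steps += 1
    return steps


rng = random.Random(1)
for n in (256, 1024, 4096):
    mean = sum(greedy_steps(n, 0, n // 2, rng) for _ in range(10)) / 10
    print(f"n={n:5d}  mean greedy steps={mean:5.1f}  ln n={math.log(n):4.1f}")
```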

4 The Double Clustering Model 

Our main model of interest is conceptually similar to the independent interest
model of the last section, but rather than letting each vertex's interest
in each other vertex be an independent random variable, we let each vertex
also live in a second space, and let the interest between two vertices be their
proximity in that space. In a social network, this can be represented
by each individual not only living somewhere in the physical world, but
also having some position in a less clearly defined "space of interests" (his
job, activities, etc.). People who live close to one another tend to become
acquainted by "default", while people befriend those far away only if they
have interests that agree to some extent.

In the constructed graph, each vertex is thus connected to every other vertex 
that is at least as "interesting" as any other that is at most as "far away". 
Formally, let a distance function be a real-valued kernel $d(x, y)$ such that
$d(x, x) = 0$ and $d(x, y) + d(y, z) \ge d(x, z)$, but which is not necessarily
symmetric. The most general definition of such a graph is then:

Definition 4.1. (The Double Clustering Graph) Let $\{x_i\}_{i=1}^n$ and $\{y_i\}_{i=1}^n$ be
two sets of points in possibly different spaces $M_1$ and $M_2$ with distance
functions $d_1$ and $d_2$ respectively. The graph G = (V, E) is constructed as
follows:

• $V = \{1, 2, \ldots, n\}$.

• $(i, j) \in E$ if for all $k \in V$, $k \ne i, j$:

$$d_1(x_i, x_k) < d_1(x_i, x_j) \implies d_2(y_i, y_k) > d_2(y_i, y_j).$$

Note that the two sequences are symmetric in the definition, and that G
contains a nearest neighbor graph for both point sets. If, in particular, we
let the $x_i$ and $y_i$ be the vertices of two graphs $G_1$ and $G_2$, letting $d_1$ and
$d_2$ be graph distance, we may see the construction as an augmentation of
either one to create a denser graph.
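For point sets in metric spaces, Definition 4.1 can be evaluated by brute force: for each i, scan the other points in increasing $d_1$ order while tracking the smallest $d_2$ seen among strictly $d_1$-closer points. The sketch below assumes Euclidean distances and random positions resembling Figure 1; `double_clustering_edges` is a hypothetical name, and its $O(n^2 \log n)$ cost limits it to small n.

```python
import math
import random
from itertools import groupby


def double_clustering_edges(xs, ys, d1=math.dist, d2=math.dist):
    """Directed edge set of the double clustering graph (Definition 4.1).

    (i, j) is an edge iff d2(y_i, y_j) < d2(y_i, y_k) for every k != i whose
    x_k is strictly d1-closer to x_i than x_j is.
    """
    n = len(xs)
    edges = set()
    for i in range(n):
        def key(j):
            return d1(xs[i], xs[j])

        others = sorted((j for j in range(n) if j != i), key=key)
        best_d2 = math.inf  # smallest d2 among points strictly d1-closer so far
        for _, group in groupby(others, key=key):
            group = list(group)  # points at equal d1-distance do not block each other
            for j in group:
                if d2(ys[i], ys[j]) < best_d2:
                    edges.add((i, j))
            best_d2 = min(best_d2, min(d2(ys[i], ys[j]) for j in group))
    return edges


# Setup resembling Figure 1: positions in [0, 1.33] x [0, 1], colours in [0, 1]^3.
rng = random.Random(0)
n = 100
xs = [(rng.uniform(0, 1.33), rng.uniform(0, 1)) for _ in range(n)]
ys = [tuple(rng.random() for _ in range(3)) for _ in range(n)]
edges = double_clustering_edges(xs, ys)
print(len(edges), "directed edges")
```

With no distance ties (true almost surely here), swapping the roles of the two point sets yields the same edge set, which is the symmetry noted above.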

Since we are interested in probabilistic models, we want to let $(x_i)$ and $(y_i)$
be random points. One way of doing this is to let π be a random permutation
of [n], and then let $y_i = x_{\pi(i)}$. In the graph case, this corresponds to:

Definition 4.2. (Random Double Clustering Graph) For a vertex set V,
let $G_1 = (V, E_1)$ and $G_2 = (V, E_2)$ be given graphs. Let π be a random
permutation of V, and construct $G' = (V, E')$ by letting $(u, v) \in E'$ if for all
$w \in V$, $w \ne u, v$:

$$d_1(u, w) < d_1(u, v) \implies d_2(\pi(u), \pi(w)) > d_2(\pi(u), \pi(v))$$

where $d_1$ and $d_2$ are graph distances in $G_1$ and $G_2$ respectively.

Note that every edge added in the construction has a direction, though in
many cases (such as nearest neighbors in $G_1$ and $G_2$) edges in both directions
will be included. One may choose to see the resulting graph as undirected
by simply removing directionality and duplicated edges. For the sake of
bounding the routing time, it is advantageous to preserve directionality and
route using only outgoing edges.

Figure 1: A double clustering graph of 100 vertices. Each vertex has a
random position in a two-dimensional physical space ([0, 1.33] × [0, 1]), as
well as a position in a three-dimensional color space (RGB, $[0, 1]^3$), both
using Euclidean distance.

In light of this, and before proceeding to analysis, we note that the construction
G' works equally well if $G_1$ and $G_2$ are directed graphs.

5 Analysis of Double Clustering 

We will analyze special cases of Definition 4.2. We start by proving that
greedy routing takes only O(logn) steps in expectation when we construct 
a double clustering graph using two directed cycles. Augmenting a directed 
cycle is the most basic form of Kleinberg type navigability, and has been 
extensively investigated in the case of independent augmentation (see e.g. 
P]), but of course is not a good model for most real world scenarios. 

More general models, in particular where the first space is not directed, are 
more complicated. In this case the probability of finding a link that halves 
the distance to the destination is not independent of the previous route 
taken. This can be seen in the simulations below, where double clustering 
graphs have slightly longer greedy paths than the equivalent independent interest
graphs, though seemingly only by a constant. We attempt an analysis
of one class of such models, where the first graph may take a more general 
form, and the second is an undirected cycle (in particular, this includes the 
case of two undirected cycles), but to do so we are forced to modify the 
routing used somewhat. The resulting algorithm is still a form of decentralized
routing by Kleinberg's definition. Using this, we are able to show that
routing takes a polylogarithmic number of steps, a somewhat worse bound 
than what we expect is true. 

We conjecture that double clustering can be applied to just about any graph 
(see the conclusion), but cannot yet prove it.

5.1 Two Directed Cycles 

Let $G_1$ and $G_2$ in Definition 4.2 be two cycles of n points, that is, the directed
graphs with vertex set $V_1 = V_2 = \{0, 1, \ldots, n - 1\}$ and both $E_i$ containing
an edge from u to u + 1 (modulo n) for each $u \in V_i$. We will refer to this
special case as the Double Cycle Graph. It constitutes the simplest case of
double clustering.

Below, $d(x, y) = y - x \bmod n$ will be graph distance in the cycles, and $d_\pi$
will be the corresponding distance function on the permuted cycle
($d_\pi(x, y) = d(\pi(x), \pi(y))$). We will discuss greedy routing using d, but by
symmetry the same results hold for $d_\pi$.

Note from the definition that G contains a link from each x to the point y such that
$d(x, y) = 1$ (the next vertex in the cycle) and the point z such that $d_\pi(x, z) = 1$
(the next vertex in the permuted cycle).
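Under these conventions, a sketch of the double cycle graph and of d-greedy routing follows; `double_cycle_graph`, `greedy_path`, `pi` (with `pi[v]` taken as v's position on the permuted cycle) and `out` are hypothetical names. Since $d(x, \cdot)$ enumerates the cycle in order, x's out-links are exactly the running minima of $d_\pi(x, \cdot)$ along x + 1, x + 2, and so on. The same data can be used to check Corollary 5.2 numerically: along a d-greedy path, $d_\pi$ to the target should decrease at every step.

```python
import random


def double_cycle_graph(n, seed=0):
    """Double cycle graph on {0, ..., n-1} (Section 5.1).

    pi[v] is v's position on the permuted cycle.  With d(x, y) = (y - x) mod n
    and d_pi(x, y) = (pi[y] - pi[x]) mod n, x -> y iff d_pi(x, y) < d_pi(x, w)
    for every w strictly between x and y on the cycle.
    """
    rng = random.Random(seed)
    pi = list(range(n))
    rng.shuffle(pi)
    out = {}
    for x in range(n):
        best, links = n, []
        for step in range(1, n):          # y = x+1, x+2, ... in order of d(x, y)
            y = (x + step) % n
            dp = (pi[y] - pi[x]) % n
            if dp < best:                 # new minimum of d_pi: x links to y
                links.append(y)
                best = dp
        out[x] = links
    return pi, out


def greedy_path(out, n, source, target):
    """d-greedy routing: move to the out-neighbour closest to target in d."""
    path = [source]
    while path[-1] != target:
        path.append(min(out[path[-1]], key=lambda y: (target - y) % n))
    return path


n = 1000
pi, out = double_cycle_graph(n, seed=3)
path = greedy_path(out, n, 0, n - 1)
print(len(path) - 1, "greedy steps")
```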

Addition of vertex values below is always modulo n, but the notation is 
suppressed for readability. 

Lemma 5.1. For $w \ne z \in V$, let $w' \in V$ be the vertex that w routes to
when d-greedy routing for z. Then:

• w' lies inclusively between w + 1 and z in the cycle (that is, $d(w', z) <
d(w, z)$).

• w' lies inclusively between w + 1 and z in the permuted cycle ($d_\pi(w', z) <
d_\pi(w, z)$).

Proof. The first statement is obvious from the definition of greedy routing,
and the fact that $w \to w + 1$, so there is always a choice which approaches z.

To prove the second statement, assume that w' is not between w + 1 and z in
the permuted cycle. This means that $d_\pi(w, z) < d_\pi(w, w')$. Let A be the set
of points inclusively between w' + 1 and z (that is, $A = \{w' + 1, w' + 2, \ldots, z\}$).
Define q as the first point in A such that $d_\pi(w, q) < d_\pi(w, w')$, noting that
at least one such point, z, exists. But by construction, and since $w \to w'$, q
must be closer to w in the permuted cycle than any vertex between w and
itself, and thus $w \to q$. But if this were the case, w would have routed to q
and not w', which is a contradiction. □

Corollary 5.2. For any permutation π, a d-greedy path from any vertex y
to any other z in the double cycle monotonically approaches z in $d_\pi$; likewise
a $d_\pi$-greedy path monotonically approaches z in d.

In light of the corollary, it might seem that greedy routing with respect to
d and $d_\pi$ would produce the same paths. In fact, this is not the case, which
we prove as an aside:

Lemma 5.3. There exists a permutation π such that greedy routing from
some vertex y to some vertex z with respect to d and $d_\pi$ produces different
paths.

Proof. Let π, y and z be such that there are exactly two vertices $x_1$ and $x_2$
that lie between y and z in the cycle ($d(y, z) > d(x_1, z) > d(x_2, z)$), and also
lie between y and z in the permuted cycle. Let $x_1, x_2$ appear in the opposite
order in the permuted cycle ($d_\pi(y, z) > d_\pi(x_2, z) > d_\pi(x_1, z)$).

Note that by construction, y will have edges to both $x_1$ and $x_2$ in the double
cycle graph, because $x_1$ is closer in $d_\pi$ than any d-closer point to y, and
likewise for $x_2$ (in particular, it is closer to y than $x_1$ in $d_\pi$). However, when
greedy routing with respect to d, y will choose $x_2$, while when greedy
routing with respect to $d_\pi$, it will choose $x_1$. □

Marginally, under a uniform random choice of π, the probability that $x \to y$
in the double cycle model is exactly $1/d(x, y)$, as it should be for navigability.
However, as in all double clustering graphs, the random edges are
not formed independently, so the situation is different from previous results.
We will see, however, that in the case of the double cycle, the monotonicity
of the routing path also in $d_\pi$, as proved above, makes the routing events
independent (in a sense which will be made precise below): the knowledge
provided by previous routing steps is always "behind us" in the permuted
cycle.
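The marginal claim $P(x \to y) = 1/d(x, y)$ is easy to check by simulation: $x \to y$ exactly when y has the smallest $d_\pi(x, \cdot)$ among the $d(x, y)$ vertices x + 1, ..., y, and under a uniform π each of them is equally likely to be that minimum. The sketch below (hypothetical names; x = 0 by symmetry) estimates this probability.

```python
import random


def link_frequency(n, y, trials, seed=0):
    """Monte Carlo estimate of P(0 -> y) in the double cycle graph on n vertices."""
    rng = random.Random(seed)
    pi = list(range(n))
    hits = 0
    for _ in range(trials):
        rng.shuffle(pi)                                      # fresh uniform permutation
        d_pi = [(pi[v] - pi[0]) % n for v in range(y + 1)]   # d_pi(0, v) for v <= y
        # 0 -> y iff y beats every w with 0 < w < y in d_pi-distance from 0.
        if all(d_pi[y] < d_pi[w] for w in range(1, y)):
            hits += 1
    return hits / trials


for y in (1, 2, 5, 10):
    print(f"d(0, y)={y:2d}  estimate={link_frequency(50, y, 10_000):.3f}  1/d={1 / y:.3f}")
```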

Theorem 5.4. For any two points $y, z \in [n]$, the greedy path from y to z
in the double cycle graph formed by a uniformly random permutation π has
expected length O(log n).

Proof. The proof method is the same as in Theorem 3.2; thus we will consider
starting in a point x such that $r \ge d(x, z) > r/2$ and bound the expected
number of steps (conditioned on the earlier path) until the route is within
r/2 of z.

Divide the vertices between x and z in the cycle into two equal-sized sets R
and H, so that if d(x, z) is odd

$$R = \{x + 1, x + 2, \ldots, x + \tfrac{d(x,z) - 1}{2}\}, \qquad H = \{x + \tfrac{d(x,z) + 1}{2}, \ldots, z - 1\}.$$

If d(x, z) is even, we let R end at x + d(x, z)/2 and let H run from
x + d(x, z)/2 + 1 to z, so that R and H retain the same size.

Note that if $x \to H$, then we can route to a point with distance to z less
than r/2, and that

$$P(x \to H) = P(d_\pi(x, H) < d_\pi(x, R)) = 1/2,$$

where $d_\pi(x, S)$ means the minimal distance from x to any point in the set
S.

Let A be the event that $d_\pi(x, H) < d_\pi(x, R)$, and B be the event that before
reaching x we greedy routed along the path

$$x_1 \to x_2 \to \cdots \to x_k \to x$$

for some k and sequence of vertices where $d(x_i, z) > d(x, z)$. We will show
that $P(B \cap A) = P(B \cap A^c)$, which (since $P(A) = P(A^c)$) implies that
$P(B \mid A) = P(B \mid A^c)$ and thus that A and B are independent.

To do this, we define a bijection between the sets of permutations in $B \cap A$ and
$B \cap A^c$. For a given $\pi \in B \cap A$, let π' be π composed with a permutation
that swaps the positions of the elements in R and H. Clearly, if $\pi \in A$, then
$\pi' \in A^c$.

By Corollary 5.2, $d_\pi(x_i, z) > d_\pi(x, z)$ for all the $x_i$ in the definition of B.
This means that all the vertices in $R \cup H$ are further from each $x_i$ than
x in both d and $d_\pi$. Thus the internal order of vertices in $R \cup H$ cannot
affect the edges of the $x_i$, and if $\pi \in B$, then $\pi' \in B$ as well. It follows
that $|B \cap A| = |B \cap A^c|$, whence

$$P(A \mid B) = P(A) = 1/2$$

for any B defined as above. At each vertex we reach at distance between
r and r/2 from z, the probability of having a link to a vertex with distance
less than r/2 is thus at least 1/2, regardless of which vertices we visited
previously. The result now follows as in Theorem 3.2. □
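As a rough numerical companion to Theorem 5.4 (assuming a handful of samples suffices to see the trend), the sketch below measures d-greedy path lengths in double cycle graphs of growing size; by the theorem the mean should grow about logarithmically in n. Only the d-greedy choice is needed, so each vertex's candidate links are scanned up to the target instead of being stored. The function name is hypothetical.

```python
import math
import random


def greedy_length(n, source, target, rng):
    """Steps of d-greedy routing in a double cycle graph with a fresh uniform
    permutation.  The greedy choice from x is the last running minimum of
    d_pi(x, .) met while scanning x+1, ..., target (the out-neighbour of x
    closest to the target without passing it)."""
    pi = list(range(n))
    rng.shuffle(pi)
    x, steps = source, 0
    while x != target:
        best, choice, y = n, None, x
        while y != target:
            y = (y + 1) % n
            dp = (pi[y] - pi[x]) % n
            if dp < best:
                best, choice = dp, y
        x, steps = choice, steps + 1
    return steps


rng = random.Random(7)
for n in (2**8, 2**11, 2**14):
    mean = sum(greedy_length(n, 0, n - 1, rng) for _ in range(30)) / 30
    print(f"n={n:6d}  mean steps={mean:5.1f}  ln n={math.log(n):4.1f}")
```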

5.2 Bounded Doubling Dimension and an Undirected Cycle 

In this section, we let $G_1$ belong to a more general family meeting the criteria
of Definition 2.1, and we let $G_2$ be an undirected cycle (a one-dimensional
toric grid). Like in previous cases, we shall bound the expected number of 
steps that it takes to halve the distance to the destination: however, unlike 
in previous cases, the event of halving the distance in each step of greedy 
routing is not independent of the previous path. 

In order to control the dependencies between the edges encountered at each
step, we introduce a modified routing algorithm we call half-greedy routing.
When routing for a vertex z and currently at x, we examine each of x's
neighbors in the double clustering graph G. If any neighbor w is such that
$d_1(x, z) \ge 2 d_1(w, z)$, then w is chosen for the next step. If no such w
is found, x routes to a neighbor w' in $G_1$ such that $d_1(w', z) = d_1(x, z) - 1$
(choosing from all possible such w' by some deterministic rule).
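A sketch of half-greedy routing under stated assumptions: $G_1$ is undirected and given as adjacency lists, G as out-neighbor lists, and $d_1(\cdot, z)$ is obtained by breadth-first search from z. The halving test is taken as $d_1(x, z) \ge 2 d_1(w, z)$. When several neighbors qualify for a "very big step" the first in the list is taken, and small steps use the smallest label; both tie rules, and all function names, are our own choices.

```python
from collections import deque


def bfs_distances(adj, source):
    """Graph distances from source (G1 undirected, so this gives d1(v, z))."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def half_greedy_step(x, g_out, g1_adj, d1_to_z):
    """One half-greedy step from x towards z."""
    # "Very big step": a G-neighbour at most half as far from z as x is.
    for w in g_out[x]:
        if d1_to_z[x] >= 2 * d1_to_z[w]:
            return w
    # Otherwise move one G1-step closer to z (smallest label as the rule).
    return min(w for w in g1_adj[x] if d1_to_z[w] == d1_to_z[x] - 1)


def half_greedy_route(source, z, g_out, g1_adj):
    d1_to_z = bfs_distances(g1_adj, z)
    path = [source]
    while path[-1] != z:
        path.append(half_greedy_step(path[-1], g_out, g1_adj, d1_to_z))
    return path


# Hypothetical example: G1 is the path 0-1-...-9, G adds one shortcut 0 -> 7.
g1_adj = {v: [u for u in (v - 1, v + 1) if 0 <= u < 10] for v in range(10)}
g_out = {v: list(g1_adj[v]) for v in range(10)}
g_out[0].append(7)
print(half_greedy_route(0, 9, g_out, g1_adj))  # -> [0, 7, 8, 9]
```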

Half-greedy routing thus either takes a "very big step", which immediately 
halves the distance, or a very small step to the next vertex in Gi. Intuitively, 
one may imagine this as a participant in a Milgram style experiment only 
bothering to send the letter by post if he knows somebody very suitable, 
and otherwise just giving it directly to one of his neighbors. The analytical 
advantage of this approach is that while subsequent vertices reached by 
a greedy route do not have independent positions in G2, neighbors in Gi 
(nearly) do. The navigability result thus follows from this lemma: 
Lemma 5.5. Let π be a random permutation of [n] and $d_\pi$ be circular
distance under this permutation. That is, for $x, y \in [n]$,

$$d_\pi(x, y) = \min(|\pi(x) - \pi(y)|,\ n - |\pi(x) - \pi(y)|).$$

Let A and B be disjoint subsets of [n], such that $|A| = k$ and $|B| \ge qk$ for
some $q > 0$. The elements of A are enumerated $a_1, a_2, a_3, \ldots, a_k$. Define a
random variable τ by

$$\tau = \min(t > 0 : d_\pi(a_t, A \setminus \{a_t\}) > d_\pi(a_t, B)),$$

or τ = k if this never occurs. Then, for t < k/5,

$$P(\tau > t) \le e^{-mt},$$

where $0 < m = m(q) < \infty$ is a constant independent of n and k.
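The tail bound in Lemma 5.5 can be explored numerically. The sketch below, with hypothetical names and an arbitrary choice of n, k and q, draws τ for a uniformly random permutation and prints the empirical tail $P(\tau > t)$ for $t < k/5$; an approximately geometric decay is what the lemma predicts. Since π is uniform, fixing A and B to be the first labels loses no generality.

```python
import math
import random


def sample_tau(n, k, q, rng):
    """One draw of tau from Lemma 5.5 with A = {0, ..., k-1} (a_t = t - 1)
    and B = the next ceil(q * k) labels; pos[v] plays the role of pi(v)."""
    size_b = math.ceil(q * k)
    pos = list(range(n))
    rng.shuffle(pos)

    def d_pi(u, v):  # circular distance on the permuted cycle
        diff = abs(pos[u] - pos[v])
        return min(diff, n - diff)

    for t in range(1, k + 1):
        a = t - 1
        d_a = min(d_pi(a, other) for other in range(k) if other != a)
        d_b = min(d_pi(a, b) for b in range(k, k + size_b))
        if d_a > d_b:
            return t
    return k


rng = random.Random(0)
n, k, q, draws = 400, 40, 1.0, 2000
taus = [sample_tau(n, k, q, rng) for _ in range(draws)]
for t in range(1, k // 5):
    tail = sum(tau > t for tau in taus) / draws
    print(f"t={t}  P(tau > t) ~ {tail:.4f}")
```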

We will establish this lemma below. First we show how it leads to the desired 
result. 

Theorem 5.6. In Definition 4.2, let $G_1$ be a connected graph from a family
with bounded doubling dimension, and $G_2$ be an undirected cycle. Then the
path through G between any two vertices x and z when half-greedy routing
with respect to $d_1$ has expected length $O(\log^2 n)$.

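Proof. Let T be the time it takes for half-greedy routing between any two
vertices. We will establish the stronger fact that for n sufficiently large and a
constant h,

$$P(T > h(\log n)^2) < \frac{1}{n}. \qquad (3)$$

Since every step decreases $d_1(\cdot, z)$ by at least one, T never exceeds n, and it
then follows that

$$E[T] \le h(\log n)^2 \left(1 - \frac{1}{n}\right) + n \cdot \frac{1}{n} = O(\log^2 n).$$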

Fix a destination z, and let the phases be as in the proof of Theorem 3.2.
Consider the i-th phase (the set of vertices x such that $2^{i-1} < d_1(x, z) \le 2^i$),
where i is such that the phase is "big", meaning it contains more than $\log^2 n$
vertices. We let A and B from Lemma 5.5 be defined by

$$B = B_{2^{i-2}}(z)$$

and

$$A = B_{5 \cdot 2^{i-1}}(z) \setminus B,$$

where the $B_r(z)$ are balls with respect to $d_1$. We note that (distance below
always means $d_1$ except where otherwise noted):
1. Each vertex in the i-th phase belongs to A.

2. All vertices in B are within distance $3(2^{i-1})$ of any vertex in the i-th
phase.

3. Every vertex within distance $3(2^{i-1})$ of a vertex in the i-th phase is in
$A \cup B$.

Together, these three facts mean that if a vertex x in the i-th phase has a
randomly assigned position in G2 (the cycle) which is at least as close (with 
respect to the permuted positions in G2) to a vertex in B as any vertex 
other than itself in A, the resulting double clustering graph G will have an 
edge from x into B. 

Now consider half-greedy routing starting from a vertex x in the i-th phase.
Let the enumeration of A be such that $a_1 = x$ and each subsequent $a_j$, for j
up to some ℓ, is the vertex where $a_{j-1}$ would route if a "very big step" was
not found, where $a_\ell$ is the first vertex encountered that has a $G_1$ neighbor in
a lower phase; after this we may order the elements of A as we wish.

Since each vertex in B is less than half as far from z as those in phase i,
the random variable τ from Lemma 5.5 thus dominates the time we spend
in the i-th phase after starting from a given vertex.

Let b = 2/m, where m is the constant from Lemma 5.5 with q a positive lower
bound on $|B|/|A|$, which exists because of the bounded doubling dimension. Note
that q, and thus m and b, are independent of which phase we are in. Let $E_i^x$ be
the event that we spend more than b log n steps in the i-th phase after starting from
a vertex x in the phase. Lemma 5.5 and the argument above give

$$P(E_i^x) \le P(\tau > b \log n) \le e^{-m b \log n} = \frac{1}{n^2}.$$

Since the probability is simply the uniform measure on permutations of [n], this
means that starting from any given vertex x in the phase, routing to the
next phase will take more than b log n steps in less than $1/n^2$ of all the
permutations. Since the edges of the graph are dependent, where we enter the phase
may depend on the permutation, but the very worst case scenario is that we
always enter the phase at the vertex where it will take the most steps to
route to the next. Let $E_i$ be the event that, starting from some vertex in the
i-th phase, we spend more than b log n steps in the phase. By a union bound,

$$P(E_i) = P\Big(\bigcup_{x \in \text{i-th phase}} E_i^x\Big) \le \sum_{x \in \text{i-th phase}} P(E_i^x) \le \frac{n}{n^2} = \frac{1}{n},$$

where the last inequality holds because every phase trivially contains at 
most n vertices. 
There are at most $\log_2 n$ "big" phases, so the probability of spending more
than b log n steps in any of them is less than $\log_2 n / n$ by another union bound.
Since the number of vertices in the "small" phases is less than $4 \log^2 n$, (3)
follows with $h = b/\log(2) + 4$. □



The remainder of this section is a proof of Lemma 5.5. In order to establish
the Lemma, we will make use of something we call the toy train track
construction of a random permutation. We equate each vertex on the cycle
with a curved segment of track in a toy train set. These segments can be
attached to each other to make longer sections², and when all the n segments
are attached they form a complete circle. All the pieces start in a bin, and
are either red (corresponding to vertices in A), blue (corresponding to vertices
in B), or gray (corresponding to vertices in neither set). We build the
random circular track, starting as follows:

1. We pick up the segment of track corresponding to $a_1$ from the bin; this is
our current section.

2. Uniformly select from the remaining pieces a segment x to attach
clockwise from the current section, and then another segment y to
attach counterclockwise from the section.

3. As long as neither the x nor the y picked up in the last step is a blue or
red piece, we continue to draw two new pieces to attach to the section.

This continues until a red or blue segment has been attached at one or both
ends of the section. At this time the first construction stage is completed,
and we put the constructed section of track back in the bin together with
the other pieces. If at least one end was blue, then the building phase
terminates.

If no blue piece was found, we start the second construction stage: we try
to take $a_2$ out of the bin. If $a_2$ cannot be found on its own (it was part of
the previous section), then the stage ends immediately. If it is found, then
we proceed to build a new section starting from it as in the first stage, but
this time we stop whenever a blue segment, a red segment, or the previously
constructed section of track is attached to $a_2$'s section. At the end of stage
two, we put $a_2$'s section back in the bin as before (if one was built), and,
unless a blue piece was found, continue to stage three, which we complete
in a similar manner.

¹ Our chosen vocabulary is to consistently use "segment" for each element, and
"section" for connected collections of segments.
If at any time all the pieces have been added to one section, the building
phase terminates, and likewise if we run out of red pieces to start from.
When the building phase has terminated, we attach all the sections and 
segments in the bin in a random permutation (draw one at a time, and 
attach clockwise from the last) to form a completed circle. 

Let $X$ be the number of construction stages. We make three claims about
this construction which together establish Lemma 5.5:

1. The circle of track segments created is a uniformly random circular 
permutation. 

2. $X \ge \tau$ (as defined above) for the corresponding permutation.

3. For $t < k/5$, $P(X > t) < e^{-mt}$, where $m$ depends on $q$ but not on $k$ and
$n$.

Proof of Claim 1: This follows from the conditional distribution of random
permutations. If one conditions on two segments $s_1$ and $s_2$ being next to
each other, then the resulting distribution is a random permutation of the
remaining segments, with the $s_1 s_2$ section uniformly inserted. This is
equivalent to returning the section to the bin. Likewise, another section
$s_3 s_4$ would simply be uniformly inserted again. The claim follows from a
series of such arguments.

□ 
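Claim 1 can be probed empirically on a small instance. The following Python sketch is an illustration under simplifying assumptions: it runs only the first construction stage (with segment 0 in the role of $a_1$) followed by the final random assembly, omits the later stages that start from $a_2, a_3, \ldots$, and tallies the circular arrangements produced. The labels, the terminating set, and the function names are hypothetical.

```python
import random
from collections import Counter


def toy_train_circle(n: int, terminating: set, rng: random.Random) -> tuple:
    """First construction stage from segment 0 (playing a_1), then final assembly.

    `terminating` holds the red and blue segment labels. Returns the finished
    circle read clockwise and rotated so that segment 0 comes first.
    """
    bin_ = [[s] for s in range(1, n)]  # all other segments start in the bin
    section = [0]
    while bin_:
        x = bin_.pop(rng.randrange(len(bin_)))
        section = section + x           # attach x clockwise
        if not bin_:                    # x was the last piece: the circle is closed
            break
        y = bin_.pop(rng.randrange(len(bin_)))
        section = y + section           # attach y counterclockwise
        if x[0] in terminating or y[0] in terminating:
            break                       # a red or blue end finishes the stage
    bin_.append(section)                # put the section back in the bin
    rng.shuffle(bin_)                   # attach everything in a random order
    circle = [s for item in bin_ for s in item]
    start = circle.index(0)
    return tuple(circle[start:] + circle[:start])


def arrangement_counts(n: int, terminating: set, trials: int, seed: int = 0) -> Counter:
    rng = random.Random(seed)
    return Counter(toy_train_circle(n, terminating, rng) for _ in range(trials))


counts = arrangement_counts(n=5, terminating={2}, trials=48_000)
print(len(counts), min(counts.values()), max(counts.values()))
```

If the construction is uniform, all $4! = 24$ circular arrangements of five segments should appear, each roughly 2,000 times out of 48,000 runs.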

Proof of Claim 2: This is almost immediate. If we encounter a blue piece
during construction stage $i$, then all the segments closer to $a_i$ than that
piece were gray, hence $d_\pi(a_i, A \setminus \{a_i\}) > d_\pi(a_i, B)$. If we don't find a blue
piece in any construction stage, then $X = k$, which is an upper bound on $\tau$.

□ 

Proof of Claim 3: Let $E_i$ be the event that we encounter a blue piece in the
$i$-th construction stage. $X$ is $\min\{i : E_i \text{ occurs}\}$, or $k$ if this is undefined. In
the first stage, there are $k + qk$ pieces for which we terminate, and $qk$ are
blue, so $P(E_1) \ge q/(1+q) =: p$ (in fact greater).
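The first-stage bound can also be checked by simulation. This sketch assumes, as an illustration rather than the author's code, that the bin holds $k - 1$ red, $qk$ blue, and $n - k - qk$ gray single segments once $a_1$ has been removed, and estimates $P(E_1)$ by drawing pieces in pairs until a red or blue one appears. The function names and parameter values are hypothetical.

```python
import random


def first_stage_finds_blue(k: int, n_blue: int, n_gray: int, rng: random.Random) -> bool:
    """One run of construction stage one; returns True if event E_1 occurs."""
    # a_1 has already been taken out, so k - 1 red pieces remain in the bin.
    bin_ = ["red"] * (k - 1) + ["blue"] * n_blue + ["gray"] * n_gray
    rng.shuffle(bin_)  # popping from a shuffled list = uniform draws without replacement
    while bin_:
        x = bin_.pop()                      # piece attached clockwise
        y = bin_.pop() if bin_ else "gray"  # piece attached counterclockwise
        if x != "gray" or y != "gray":      # a red or blue end stops the stage
            return "blue" in (x, y)         # E_1: at least one end is blue
    return False


def estimate_p_e1(k: int, q: float, n: int, trials: int, seed: int = 0) -> float:
    rng = random.Random(seed)
    n_blue = round(q * k)            # qk blue pieces
    n_gray = n - k - n_blue          # everything else is gray
    hits = sum(first_stage_finds_blue(k, n_blue, n_gray, rng) for _ in range(trials))
    return hits / trials


q = 1.0
print(estimate_p_e1(k=5, q=q, n=100, trials=20_000), q / (1 + q))
```

With $k = 5$ and $q = 1$ the estimate should land somewhat above $q/(1+q) = 0.5$, since only $k - 1$ red pieces compete with the $qk$ blue ones.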

Conditioned on $E_1$ not occurring, we let $e_1$ and $e_2$ be the two end pieces,
and note that

$$P(E_2 \mid E_1^c) = P(E_2 \mid E_1^c,\ a_2 \notin \{e_1, e_2\})\, P(a_2 \notin \{e_1, e_2\} \mid E_1^c).$$
If $a_2 \notin \{e_1, e_2\}$, then the second construction stage can proceed. However,
since $E_1$ did not occur, we have removed at least two red segments
from the bin and added only one new terminating section. Thus:

$$P(E_2 \mid E_1^c,\ a_2 \notin \{e_1, e_2\}) \ge P(E_1) \ge p.$$

We now have to lower bound $P(a_2 \notin \{e_1, e_2\} \mid E_1^c)$. The worst case is that
both $e_1$ and $e_2$ are red, in which case we drew 2 red segments out of $k - 1$
possible:

$$P(a_2 \notin \{e_1, e_2\} \mid E_1^c) \ge 1 - \frac{2}{k-1} = \frac{k-3}{k-1}.$$



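Using similar arguments (and rather conservative estimates), it follows that

$$P(E_i \mid E_1^c, \ldots, E_{i-1}^c) \ge p\,\frac{k - 3(i-1)}{k - (i-1)}$$

for $i < k/3$, whence, for $t < k/5$,

$$P(X > t) = \prod_{i=1}^{t} \left(1 - P(E_i \mid E_1^c, \ldots, E_{i-1}^c)\right) \le \prod_{i=1}^{t} \left(1 - p\,\frac{k - 3(i-1)}{k - (i-1)}\right) \le \left(1 - \frac{p}{2}\right)^{t} = e^{-mt},$$

where the last inequality uses that $(k - 3(i-1))/(k - (i-1)) \ge 1/2$ whenever $i - 1 < k/5$, and $m = -\log\left(1 - \frac{p}{2}\right)$.

□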
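The chain of inequalities can be checked numerically. The sketch below evaluates the product bound and $(1 - p/2)^t = e^{-mt}$ for every $t < k/5$; the value $q = 1/2$ (so $p = 1/3$), $k = 100$, and the function names are arbitrary illustrative choices.

```python
import math


def failure_product(p: float, k: int, t: int) -> float:
    """prod_{i=1}^{t} (1 - p (k - 3(i-1)) / (k - (i-1)))."""
    prod = 1.0
    for i in range(1, t + 1):
        prod *= 1 - p * (k - 3 * (i - 1)) / (k - (i - 1))
    return prod


def geometric_bound(p: float, t: int) -> float:
    """(1 - p/2)^t written as exp(-m t) with m = -log(1 - p/2)."""
    m = -math.log(1 - p / 2)
    return math.exp(-m * t)


q = 0.5                      # hypothetical blue-to-red ratio
p = q / (1 + q)              # p = q / (1 + q) = 1/3
k = 100
for t in range(1, math.ceil(k / 5)):   # all t < k/5
    assert failure_product(p, k, t) <= geometric_bound(p, t)
```

Each factor with $i - 1 < k/5$ is at most $1 - p/2$, so the assertion holds with room to spare; the first factor alone is $1 - p$.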



6 Simulations 



Simulations support the conjecture that double clustering creates navigable 
graphs over a larger span of structures. In cases where the first graph is 
not a directed cycle, double clustering gives slightly worse
performance than when the edges are independent, as expected. However,
the simulation data still strongly indicate a logarithmic growth of path
length with the size of the graph.



6.1 Combined Greedy Routing 

Since the double clustering construction is symmetric, it should create equally 
navigable networks with regard to both spaces. Thus we can perform greedy
routing in the double clustering graph with respect to either distance
function (which will sometimes lead to different results; see below).

Figure 2: Performance of the double clustering graph versus the independent
interest model using two undirected cycles.

A direct consequence of this is that we may try to route with respect to
both distance functions, using at each step whichever seems most profitable.
As above, assume that z is the target of the route. 

• At vertex x, we calculate $m_1 = d_1(w_1, z)$, where $w_1$ is the neighbor of
x which minimizes this. Similarly, calculate $m_2 = d_2(w_2, z)$.

• Let $n_1$ be the number of vertices within $m_1$ of z in the first space ($M_1$);
if the space is homogeneous, this is the volume of a ball of radius
$m_1$. Let $n_2$ be the equivalent for $m_2$ and the second space.

• Route to $w_2$ if $n_2$ is smaller than $n_1$, otherwise to $w_1$.
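The three steps above can be sketched as a single next-hop function. This is a minimal illustration, assuming each space is homogeneous so that the ball size depends only on the radius; `combined_greedy_step`, `cycle_dist`, `cycle_ball`, and the example positions on two 10-vertex cycles are all hypothetical.

```python
from typing import Callable, Iterable, TypeVar

V = TypeVar("V")


def combined_greedy_step(
    neighbors: Iterable[V],
    z: V,
    d1: Callable[[V, V], int],
    d2: Callable[[V, V], int],
    vol1: Callable[[int], int],
    vol2: Callable[[int], int],
) -> V:
    """Pick the next hop toward target z from the current vertex's neighbors."""
    nbrs = list(neighbors)
    w1 = min(nbrs, key=lambda w: d1(w, z))  # greedy choice in the first space
    w2 = min(nbrs, key=lambda w: d2(w, z))  # greedy choice in the second space
    n1 = vol1(d1(w1, z))  # vertices within m1 of z in the first space
    n2 = vol2(d2(w2, z))  # vertices within m2 of z in the second space
    return w2 if n2 < n1 else w1


def cycle_dist(a: int, b: int, n: int) -> int:
    """Distance between positions a and b on an undirected n-cycle."""
    d = abs(a - b) % n
    return min(d, n - d)


def cycle_ball(m: int, n: int) -> int:
    """Vertices within cycle distance m of a point on an n-cycle."""
    return min(2 * m + 1, n)


n = 10
pos1 = list(range(n))                  # hypothetical positions on cycle 1
pos2 = [0, 6, 2, 3, 9, 5, 1, 4, 8, 7]  # hypothetical positions on cycle 2


def d1(a: int, b: int) -> int:
    return cycle_dist(pos1[a], pos1[b], n)


def d2(a: int, b: int) -> int:
    return cycle_dist(pos2[a], pos2[b], n)


def vol(m: int) -> int:
    return cycle_ball(m, n)


print(combined_greedy_step([1, 3, 7], 5, d1, d2, vol, vol))  # 1
```

Here the best neighbor in the second cycle lies in a ball of 3 vertices around the target, versus 5 in the first, so the sketch routes to it. Ties between $n_1$ and $n_2$ fall back to $w_1$, matching the "otherwise" in the last step.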

We simulate combined routing as well as normal greedy routing for the 
models below. In these models it seems that this method recovers the performance
lost to the dependencies in the double clustering construction:
combined greedy paths are shorter than greedy paths in the independent 
interest model of the same size. 

6.2 Two Undirected Cycles 

The simplest undirected double clustering model is the case of Definition
4.2 where both $G_1$ and $G_2$ are undirected cycles. A bound on half-greedy
routing in this model is derived above, but we can also simulate the normal
greedy algorithm. The results are illustrated in Figure 2: at all sizes simulated,
greedy routing with respect to either cycle produces slightly longer paths
than the equivalent independent model, while combined greedy routing
produces slightly shorter paths. All lines seem to follow strictly logarithmic
growth.

Figure 3: Performance of double clustering letting the first space (the
geography) be a two-dimensional grid, and the second a tree with the points
as leaves (a categorization).



6.3 A Grid and a Tree 

Kleinberg's original work started with a two-dimensional grid as the base
graph and distance function, inspired, one expects, by the dimensionality,
if not the population distribution, of the surface of the earth. Later
he [H] and Watts et al. [19] proposed equivalent models based on letting
vertices have positions at the leaves of a tree. The tree represents a
hierarchical model of information, ideas, interests or other characteristics, and the
distance function is standard tree distance: $d(x, y)$ is the depth of the smallest
subtree containing both. The criteria for navigable augmentation in these
cases are consistent with (2).

A natural attempt at a realistic double clustering model is to combine both
of Kleinberg's models: we let the first space be a grid, and the second be
a hierarchical tree structure (in our case, a binary tree, though any other
branching is possible). We note that while the tree distance provides a
well-defined metric, this space cannot be seen as a graph, so this is a sub-model
of Definition 4.1 rather than Definition 4.2. A problem with the more general
model is that greedy routing is not always successful: we may
reach a vertex other than the destination with no neighbor closer to
the destination than itself. This can occur in this model when routing with
respect to the tree or the combined distance, but not in the grid (where links
to neighbors in all directions always exist); in our simulations we simply
fail and discard such routes².
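The failure and tie-breaking rules can be made concrete with a short sketch. It assumes the leaves of a complete binary tree are numbered left to right, so that the depth (height) of the smallest subtree containing leaves x and y is the bit length of x XOR y. As in the footnote on routing, a step to an equally distant vertex is allowed when nothing closer exists, visited vertices are never revisited, and a stuck route is reported as failed. All names and example graphs here are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Optional, Sequence


def tree_dist(x: int, y: int) -> int:
    """Height of the smallest subtree containing leaves x and y, assuming a
    complete binary tree whose leaves are numbered 0 .. 2^h - 1 in order."""
    return (x ^ y).bit_length()


def greedy_route(
    source: Hashable,
    target: Hashable,
    adj: Dict[Hashable, Sequence[Hashable]],
    dist: Callable[[Hashable, Hashable], float],
    max_steps: int = 10_000,
) -> Optional[List[Hashable]]:
    """Greedy routing that may move to an equally distant vertex when nothing
    closer exists, never revisits a vertex, and reports failure as None."""
    path = [source]
    visited = {source}
    x = source
    while x != target:
        candidates = [w for w in adj.get(x, ()) if w not in visited]
        if not candidates or len(path) > max_steps:
            return None  # stuck, or the step budget is exhausted
        w = min(candidates, key=lambda v: dist(v, target))
        if dist(w, target) > dist(x, target):
            return None  # every unvisited neighbor is farther: discard the route
        path.append(w)
        visited.add(w)
        x = w
    return path


adj = {0: [1, 6], 1: [0], 6: [0, 7], 7: [6]}
print(greedy_route(0, 7, adj, tree_dist))  # [0, 6, 7]
```

Returning `None` rather than raising keeps failed routes easy to count and discard, as in the simulations described above.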

Figure 3 shows a simulation of this situation. Routing purely using the
tree shows slightly worse performance than routing using the grid, and as
such the advantage of the combined model is smaller than above (for the largest
data point simulated it was, in fact, nonexistent). As expected, not all routes
were successful: at a network size of 2^^, about 0.8 of the routes using only
the tree, and 0.9 of the routes using combined routing, were successful.
Another effect of the tree is that the degree of the double clustering graph
is much higher (since many vertices are at the same distance, and we only
require them to be as close as any previous one).

6.4 Continuum Models 

Discrete and grid-based models cannot realistically describe most naturally
occurring networks, especially social networks, which are characterized by
individuals placed randomly in continuums and often with heterogeneous
population density. Continuum models for navigable networks have been
explored by Franceschetti and Meester [11] and [6], as well as by this author [17],
and Liben-Nowell et al. [T^ have proposed a model based on real data that
includes a non-uniform Poisson density of positions. Figure 4 shows a
simulation of a continuum model with 100 vertices.



7 Conclusion 

We have introduced a new form of random graph construction which, when
combined with a random permutation of the points used to create the graph,
gives rise to networks with navigable properties. These graphs are
constructed from a single natural principle, and may help explain why
navigable networks occur in the real world.

² When routing, we do allow x to route to a vertex at the same distance as itself
if there is no better choice, but (so as not to cause loops) we forbid routing to a
vertex already in the path. This is important since tree distance has the property
that a very large number of vertices are at the same distance from any other: if this
is not allowed, a large majority of the routes fail when routing by tree distance.

While we have established navigability in several cases, the analysis
presented here is far from complete. In a sense it is unfortunate that we
are able to analyze an unrealistic model (the directed cycle) for an intuitively
clear routing principle, while the proof for the more realistic model requires
somewhat contrived routing. Theorem 5.6 also has an extra $\log n$ factor
included for technical reasons in the proof: we strongly believe that neither
this term (the actual bound is $O(\log n)$) nor the use of half-greedy routing is
actually necessary. In fact, based on the absence of any opposing evidence
in simulations or otherwise, we believe

Conjecture 7.1. Let $\mathcal{F}_1$ and $\mathcal{F}_2$ be two families of graphs with bounded
doubling dimension (not necessarily with the same constants). For any two
graphs $G_1 \in \mathcal{F}_1$ and $G_2 \in \mathcal{F}_2$ of size $n$, the double clustering graph from
Definition 4.2 allows greedy routing in $O(\log n)$ expected steps.

Proving this in general is difficult since the structure of the two base graphs
controls the dependence between the edges in the construction. We are,
however, hopeful that progress can be made in this direction. Making stronger
rigorous conjectures about Definition 4.1 is also difficult since monotonic
greedy paths between vertices may not always exist, but we believe that the
resulting graph will be navigable whenever such augmentation is possible.

Beyond this, the double clustering graph, as a new form of graph construc- 
tion, has not been analyzed for questions other than navigability. Questions 
such as connectivity, diameter, and edge length remain open in some or 
all cases. And, finally, the question of how well double clustering actually 
matches the real world has not been investigated. 